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CN Abstract 

^ This paper proposes a novel framework for multigroup shape analysis relying on a hierarchical graphical statistical model on 

, '•^ shapes within a population. The framework represents individual shapes as pointsets modulo translation, rotation, and scale, following 
the notion in Kendall's shape space. While individual shapes are derived from their group shape model, each group shape model is 
derived from a single population shape model. The hierarchical model follows the natural organization of population data and the 
' top level in the hierarchy provides a common frame of reference for multigroup shape analysis, e.g. classification and hypothesis 
testing. Unlike typical shape-modeling approaches, the proposed model is a generative model that defines a joint distribution of 
object-boundary data and the shape-model variables. Furthermore, it naturally enforces optimal correspondences during the process 
!r > of model fitting and thereby subsumes the so-called correspondence problem. The proposed inference scheme employs an expectation 
maximization (EM) algorithm that treats the individual and group shape variables as hidden random variables and integrates them out 
C/3 before estimating the parameters (population mean and variance and the group variances). The underpinning of the EM algorithm is 
the sampling of pointsets, in Kendall's shape space, from their posterior distribution, for which we exploit a highly-efficient scheme 
based on Hamiltonian Monte Carlo simulation. Experiments in this paper use the fitted hierarchical model to perform (1) hypothesis 
testing for comparison between pairs of groups using permutation testing and (2) classification for image retrieval. The paper validates 
^ the proposed framework on simulated data and demonstrates results on real data. 

o 

r4 Introduction and Related Work 
in 

p^^hape modeling and analysis is an important problem in a variety of fields including biology, medical image analysis, and computer 
^~^ision|[Tl |7] |9l [12 1 that has received considerable attention over the last few decades. Kendall|12| defines shape as an equivalence 
C^l ass of pointsets under the similarities generated by translation, rotation, and scaling. Objects in biological images or anatomical 
. structures in medical images often possess shape as the sole identifying characteristic instead of color or texture. Applications of 
K%hape analysis beyond natural contexts include handwriting analysis and character recognition. In the medical context, shapes of 
kJluman anatomical structures can provide crucial cues in diagnosis of pathologies or disorders. The key problems in this context lie 
Jjh the fitting of shape models to population image data followed by statistical analysis such as hypothesis testing to compare groups, 
C^lassification of unseen shapes in one of the groups for which the shape models are learnt, or unsupervised clustering of shapes. 

A variety of applications of shape analysis deal with population data that naturally calls for a hierarchical modeling approach. 
First, the hierarchical model follows the natural organization or structure of the data. For instance, as in Figure[T] the model variables 
at the top level capture statistical properties of the population (e.g., individuals with and without medical conditions), the model 
variables at a lower level capture statistical properties of different groups within a population (e.g., clinical cohorts based on gender 
or age or type of disease within a spectrum disorder), and the model variables at the lowest level capture individual properties, which 
finally relate to the observed data. Second, the top-level population variables are necessary to enable comparison (through hypothesis 
testing or classification) between the groups within the population because the population shape variable helps provide a common 
reference frame (e.g., a grand mean) for the shape models underlying different groups. 

Previous works on "hierarchical" (overloaded term) shape modeling concern (i) multi-resolution/scale models e.g., face model 
at fine-to-coarse resolutions) or (ii) multi-part models (e.g., car decomposed into body, tires, hood, trunk) with inter-part spatial 
relationships. In contrast, our proposed model is the first hierarchical probabilistic model comprising a population with multiple 
groups (e.g., vehicle population comprising sedans, trucks, buses). 

In the context of image-based applications, shape models are constructed from image-derived data that is typically in the form 
of a set of pixels lying close to the object boundary detected in the set of images; each pointset has the same number of points 
(also interpreted as landmarks^). To obtain valid and useful shape models from the data, it is crucial to determine (i) the positions 
of the points, within each pointset, in relationship to the object boundary in the image, and (ii) meaningful correspondences\l \ of 
points across pointsets in the group and the population. Some of the early approaches to shape modeling ||3ji9j assumed knowledge 
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Figure 1 : Hierarchical Modeling of Multiple Groups of Sliapes in a Population. 



of a set of homologous landmarks, typically manually defined, comprising the pointset for each object in the population. They 
proposed methods for learning Gaussian shape models for such landmark data observed for a population. However, for shapes that 
are complex or 3 dimensional, manually defining landmarks can be infeasible and inaccurate. Thus, there is a need to include the 
pointsets, describing each individual shape in the population, as variables in an optimization process. Several approaches to the 
correspondence problem attempt to find the simplest explanation for the point positions, including methods based on the logarithm 
of the determinant of the model covariance matrix|13|, minimum description lengthll5l 1211 . and minimum entropy|2, 19|. While 
these model-building criteria are based on statistical properties of the pointsets, (i) none of these approaches use generative statistical 
models, (ii) they force correspondences in an adhoc manner by introducing adhoc terms in the objective function, and (iii) have 2 
independent stages where the first finds optimal correspondences and the second estimates model parameters. 

Some generative model approaches to shape have been introduced in the computer vision literature||4l[T0]|20l[T8]|23l, but these 
are defined relative to some pre-determined template shape with manually-placed landmarks. On the other hand, the proposed 
method learns shape models from populations of detected object boundaries in images, without any a priori definition of a template 
or known landmarks. Furthermore, it proposes a generative statistical model for hierarchical shape modeling that naturally enforces 
optimal correspondences as a result of model fitting. In this way, the proposed modeling and optimization framework subsumes the 
correspondence problem. 

This paper makes several contributions. First, it proposes a novel hierarchical generative statistical model for modeling shapes 
within a population. This model tightly couples each individual shape model to the observed data (i.e., the set of points on the object 
boundary detected in the image) by designing their joint distribution using the (nonlinear) current norm between pointsets ifTTl l22l . 
Second, the proposed optimization framework employs an expectation maximization (EM) algorithm|,6] that treats the individual and 
group shape variables as hidden random variables and integrates them out before estimating the parameters, which are population 
mean and variance and the group variances. In this way, the proposed Bayesian EM-based inference framework is an improvement 
over using the mode approximation for the hidden variables. Third, the underpinning of the EM algorithm is the sampling of shapes 
(within Kendall shape space) from their posterior PDFs, e.g., shape variables for individual data have posterior PDFs involving 
(i) likelihood PDFs designed using the (nonlinear) current norm and (ii) prior PDFs conditioned on the group shape mean. For this 
purpose, this paper exploits a novel highly-efficient sampling scheme relying on Hamiltonian Monte Carlo (HMC) simulation||8l 

[laiiEiiini. 

The remainder of the paper is organized as follows. Section[2]gives an overview of the proposed hierarchical statistical generative 
model for multigroup population population data. Section|3]presents the EM formulation for the optimization problem. It defines the 
hidden variables and the parameters and the joint PDF, the details of the expectation step for integrating over the hidden variables, 



involving the HMC approach, and the subsequent maximization step to optimize the parameters. Sections 
testing scheme, using permutation testing, for comparing any pairs of groups within the population. Section 
shape classification. Section |8] summarizes the paper 
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2 Hierarchical Multigroup Shape Model 



This section gives an overview of the proposed hierarchical statistical generative model for multigroup population population data, 
illustrated in Figure [T] 

Let the group of random variables X = model observed data where each (vector) random variable Xi is a set 

of observed (corrupted) points (representing shape) on the boundary of the structure. That is, Xi = {Xi{t)}t=i^....T where 
Xi (t) G M.^ is the D-dimensional coordinate of the point on the boundary of the i* structure. We assume that all / random 
variables Xi belong to a single group of shapes, with the group's shape model given by variables {Mi, Ci} where Mi is the mean 
shape and C\ the covariance of the shapes in the group. 

Similarly, we have other groups of shapes modeled by analogous random variables, e.g., group Y ^ {Yj}j=i^,,,,j with group 
mean and covariance {M2, C2}, group {Zk}k=i,....K with group mean and covariance {M3, C3}, etc. In order to be able to perform 
shape classification or hypothesis testing between groups, we ensure that the shape models lie in the same space by enforcing the 
same number of points T in all group shape means, i.e.. Mi = {Mi(t)}t^i j^, M2 = {M2{t)}t=i^...^T7 etc. all have the same 
number of points T. Thus, Mi e K^^, M2 S K^^, M3 e K^^, \/i X, G K^^, Vj Yj e M^^, andVfc Zk € K^^. Furthermore, we 
assume that all groups belong to a single population of shapes with shape parameters {M, C} that represent the mean and covariance 
on the group means Mi, AI2, etc. 

In the rest of the formulation, without loss of generality, we consider two groups for simplicity. For each group of shapes, we 
need to model the joint PDF comprising the population model variables M, C, group-level model variables Mi, Ci, M2, C2, and 
the data X, Y, i.e., P{M, C, Mi, Ci, M2, C2,X, Y). Given observed data x, y that comprises sets of shapes within 2 groups, the 
problem addressed in this paper is to fit the aforementioned hierarchical multigroup shape model to the data and subsequently perform 
hypothesis testing or classification. 

3 Inference via Monte-Carlo Expectation Maximization 

We solve the multigroup shape-fitting problem using the EM framework|6|. Figure [T] gives an overview of the approach. 

For the first group, i.e., corresponding to X, we assume a hidden random variable U = {t/i}i=i,...,7 where each (vector) random 
variable Ui represents the Kendall shape corresponding to the observed data Xi. The number of points in Ui is the same as those in 
Ml, i.e., T points. Similarly, we assume a hidden variable V for the second group. We also consider the group means Mi, M2 as 
hidden (Kendall-shape) variables. The parameters in our model include the group-specific covariances Ci, C2, the population mean 
M and the population covariance C. We introduce parameters /3i, /32, one per group, to control the smoothness of the hidden shape 
variables Ui,Vj. In this paper, (i) ^1,(^2 are fixed to 1 and (ii) smoothness parameters are ignored for the group means because of 
their limited practical utility. Let 6 = {Ci,C2,M, C} be the set of parameters to be estimated. Then, the inference problem is: 

maxP(x, y\9) = 


max j P[U,V,Mi,M2,x,y\9)dudvdmidm2 (1) 
In the iteration of EM optimization, with the parameter estimate 9^, the E step constructs the Q function 

Q{9\e') ^ 

Ep(u.y,M,.M.W,yM) log ^(f^' ^' Mi,M2,x, y\e). (2) 
which, because of the intractability of the expectation, we approximate using Monte-Carlo simulation as follows: 

s 1 

Q{9\e') « Q{9\e') ^ ^ -logF(^.^l;^TO^,m^,x,y|0), 

where (u'*, w", m^, m^) - P{U, V, Mi,M2\x, y, 9') (3) 

and where shapes u, v, which correspond to each observed sets of boundary points X, Y, as well as shapes rrii , m2, which correspond 
to the group means, are sampled from the posterior P{U, V, Mi, M2\x, y, 6"^). 

In the EM framework, with the hidden variables U, V, we model the joint PDF comprising the population shape variables, per- 
group shape variables, per-individual shape variables, and the data as: 

P{M,C, Ml , Ci , M2 ,C2,U, V, X, Y) ^ 
P{M, C, Ci , C2)P{Mi\M, C)P{M2\M, C) 

ni^iP{u,\Mi,Ci,Pi)p{x,\u,) 

n/^iP(y, IM2, C2,p2)PiYj\V,), where (4) 



• P{M, C, Ci, C2) is a prior on the population shape mean and covariance and the per-group shape covariances (ignored in this 
paper). 

• P{Mi\M, C), P{M2\M, C) are modelled as Gaussians with mean M and covariance C. 

• We design P(C/i|Mi, Ci, /3i) (and similarly -P(V^ IM2, C2, /32)) so as to penalize (i) the Mahalanobis distance of Ui under 
Ml, Ci as well as the (ii) deviation of every individual point Ui{t) given its neighbors. For structures in 2-dimensional images 
(13 = 2 in this paper) whose boundaries are planar closed curves, the neighborhood system underlying shape variable Ui 
(which comprises points {Ui{ty}) assigns two neighbors to each point: i.e., neighbors of t : V2 < t < (T — 1) are {t — 1) and 
{t + 1), neighbors of t = 1 are t = T and t = 2, and neighbors of i = T are t = (T — 1) and t = 1. Given this neighborhood 
system A/" = {(ii,t2) : ^1,^2 are neighbors }, 

P(C/,|Afi,Ci,/3i)^ 
I exp - 0.5{U, - Ah)'C^\U, - M,) 

\\U^{tl) ~ U,{t2)\\A (5) 
(ti,t2)eAA ^ 

where /3i controls the level of smoothness (f3i = /?2 = 1 in this paper) and Z is the partition function. 

• P{Xi\Ui) , P{Yj\Vj) are modelled using the current norm as follows. Consider two pointsets A = {ai}i=i^,,,j and B = 
{bj}j=i,...j. Given a kernel similarity function A' (•), which is Gaussian in this paper, the squared current norm between A 
and B is: 

i"=i 

J J I J 

^ ^ K{b,, , by. ) - 2 ^ ^ (a„ fe, ) . (6) 

We exploit the current norm to define a PDF on data shapes Xi given shape variables Ui as follows: 

P(X,|C/0 A i exp ( - dl{X,, uS) , (7) 

where 7 is a suitable normalization factor. Note: this allows the number of points in the shape models to be different from the 
number of observed boundary points. 



3.1 E Step: Sampling in Shape Space using Hamiltonian Monte Carlo 

In EM optimization, the E step requires sampling the hidden variables U, V, Mi, M2 from the posterior PDF P{U, V, Mi,M2\x, y,0^). 
We employ the Hamiltonian Monte Carlo procedurellSl lfT6l IfTTl for this sampling. HMC is a highly efficient sampling technique that 
belongs to the class of Markov-chain Monte-Carlo samplers. HMC exploits the gradient of the energy function (i.e., log P{U, V, Mi, M2\x, y, 9^), 
in our case) for fast exploration of the space of the hidden random variables. The key idea underlying HMC is to first augment the 
original variables with auxiliary momentum variables, then define a Hamiltonian function combining the original and auxiliary 
variables, and, subsequently, alternate between simple updates for these auxiliary momentum variables and Metropolis updates for 
the original variables. HMC proposes new states by computing a trajectory according Hamiltonian dynamics implemented with a 
leapfrog method. Such new states can be very distant from the current states, thereby leading to a fast exploration of the state space. 
Furthermore, Hamiltonian dynamics theoretically guarantees that the new proposal states are accepted with high probability. We 
sample the set of hidden variables U, V, Mi, AI2 by successively sampling the variables {Ui], {Vi], {Mi}, {M2}, following a Gibbs 
sampling technique, where each variable is sampled using HMC. The HMC procedure requires gradients of the energy function 
\og P{U, V, Mi,M2\x, y,e'-) with respect to the hidden variables {Ui},{Vj}, Afi, M2. The gradient of log P{U,V, Mi, M2\x,y,e'') 
with respect to the group mean shape variables Mi , M2 as well as the shape variables Ui , Vj is straightforward. 

Sampling in Kendall Shape Space using HMC: In general, the gradient of the log posterior for each point t in the shape Ui 
can change the translation, scale, and pose/rotation for the pointset. Specifically, the smoothness penalty on the shape variables, 
e.g., Ui, through f3i, tends to move the point Ui{t), to the mean of its neighbors, thereby shrinking the pointset. These effects 
are undesirable and significantly reduce the efficiency of the shape-sampling scheme by preventing sampling restricted to Kendall 
shape space. To avoid this effect, we propose a replace the gradient of the log posterior, within HMC, by a projected gradient that 
restricts the updated shape to Kendall shape space by removing effects of translation, scale, and rotation from the HMC update. 
Figure 12] illustrates this process. Without loss of generality, we assume that (i) u,; is centered, i.e., Ui{t) = 0, and thus lies on a 
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Figure 2: Sampling in Kendall Shape Space via Gradient Projection within HMC. Top: shows an illustration of Kendall's pre- 
shape spacefill (dotted hypersphere), which is the intersection of the (bold) hypersphere of fixed radius p (i.e., X]tll"i(^)lP ~ P^' 
fixes scale) and the hyperplane through the origin (i.e., Ui{t) = 0; fixes translation). For a pointset Ui, log-posterior gradients 

are projected onto the hyperplane to remove translation effects from the HMC update. Bottom: The resulting projection 
is then projected onto the tangent space at Ui, on Kendall pre-shape space (dotted hypersphere), to remove scale changes. The 
resulting tangent-space projection is mapped to the pre-shape space via the manifold exponential map. The resulting pre-shape 
is rotationally aligned with the original shape Ui, yielding (not shown in figure). These steps project the log-posterior gradient at 
Ui, within HMC, to generate a new sample point (not shown in figure) in Kendall shape space. 



hyperplane, (ii) Ui has size p, i.e., X]tll"j(^)lP ~ P^' '^hus lies on a hypersphere and (iii) Ui has a certain pose. Kendall pre- 
shape space|12| is the hypersphere (S^^~^ with radius p) that is the intersection of the hypersphere S'^^ and the aforementioned 
hyperplane (e M^^) that passes through the origin. First, we decompose the gradient for Ui into 3 components: (i) a component 
orthogonal to the hyperplane (this causes translation of Ui from its current location), (ii) a component within the hyperplane but 
orthogonal to the hypersphere (this causes changes in the size of Ui), and (iii) a component within the hyperplane and tangent to the 
hypersphere. Subsequently, we project the gradient vector to reduce it to the third component and then take the manifold exponential 
map on the hypersphere. Lastly, to remove effects of rotation, we rotationally align the resulting pointset in pre-shape space to the 
original pointset Ui. This gives the updated Ui in Kendall shape space. 

Sampling in Hierarchical Shape Model: We generate a sample of size S as follows (S = 100 in this paper): 

1. Set the sample index variable s to 0. Initialize the sampling procedure with the sample point s = denoted by = = 



Figure 3: Simulated Box-Bump Shapes| 13 |. Top Row: Simulated ground-truth shape models for the 2 groups that each comprise 
4 shapes. The bump for the first group (blue) is on the left while the bump for the second group (red) is on the right. Each shape has 
64 points shown by circles. The black filled circle indicates the first point in the list; other points are numbered counter-clockwise. 
Bottom Row: Corrupted data where the point ordering in each shape is randomly circularly shifted (to induce poor correspondences) 
and independent Gaussian noise is added to each point position (to mimic errors in boundary detection). 



{v^}, nil, ^2- Given sample point s, sample the (s + 1)* sample point as follows. 

2. Vi sample u^^^ ^ P{Ui,v'' ,ml,m2\x, y, 9^) via projected-gradient HMC. 

3. Vj sample ^ P{u^^^ ,Vj,m\,m2\x,y,9^) via projected-gradient HMC. 

4. Sample m^^^ ^ v^^^, Mi, TO||a:, y, 9^) via projected-gradient HMC. 

5. Sample mf^^^ ^ m^+\ M2|a;, y, 9^) via projected-gradient HMC. 

6. If s + 1 = S, then stop; otherwise increment s by 1 and repeat the sampling steps. 

To ensure the generation of independent samples between Gibbs iteration s and the next s + 1, we run the HMC procedure sufficiently 
long. Furthermore, as required in Gibbs sampling, we ensure independent samples between EM iteration i and the next i + 1 by 
burning in and discarding the first few sample points s for use in the M step. 

3.2 M Step: Optimizing Parameters 

In the EM optimization, at the i* iteration, the M step entails maximization of the Q function Q{9\9^) with respect to the param- 
eters 9 and sets ^ argmaxg X]f=i log^C""*; ^ "^ii "^21 v\^)- Subsequently, we get optimal values in closed form for the 
parameters C{+^ , M'+\ C'+^: 

^i""' = ^ E E - - ® 

s=l t=l 

^2^' = :^ E E (("I - ^^Wj - "^2)') (9) 



1 s 

M''-'^^Y.^rnl+m^2) (10) 



s=l 
S 



(m^-M)(m^-Af)'). (11) 



4 Validating the Hierarchical Shape Model and Inference on Simulated Shapes 

This section provides an example of a standard simulated test dataset lfT3l and shows that the proposed framework is able to correctly 
learn the true group and population models. Figure [3] (top row) shows simulated (ground-truth) pointsets, which are very similar 
to those used infTT], where the population is a collection of box-bumpf\3] shapes where the location of the bump (and each point) 
exhibits linear variation, across the group, over the top of the box. While the desired population mean AI corresponds to a shape 
with the bump exactly in the middle, the bumps in the true group means Mi , M2 are located symmetrically on either side of the 
middle. The true covariance matrices for the groups (Ci, C2) and the population (C) have a single non-zero eigen value. Figure [3] 
(bottom row) shows the observed corrupted data, i.e., {x^jf^i, {y^lj^i, where we induce poor correspondences and noise in the 
point locations. 



Figure 4: Left: shows the optimal population mean M (black dots) along with the expected values for the group means Mi (blue 
dots) and M2 (red dots) after EM inference. Middle: shows the correspondence of points for the corrupted data Xi across a selected 
group (blue shapes). Right: shows the correspondences for the expected values of shape models Ui after EM inference. 
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Figure 5: 3 Graplis on Left: show the largest 5 eigen values of the covariance matrices Ci, C, and C2, respectively, before and after 
EM optimization. Riglitmost: shows the histogram of Hotelling's statistics obtained by permutation testing (all ^6*4/2 = 35 
unique group-label permutations) and the statistic for the non-permuted groups (red dot). 



Figure |4] shows the means for the groups and population after EM optimization. We see that the estimated population mean M 
and the expected values of the group means Mi, M2 after EM optimization are close to their true values. Figure |4] also shows that 
the correspondence of corrupted data points Xiit), across the group (i — 1, . . . , 4), is poor, indicating a large variance. On the other 
hand, after EM optimization, the correspondence of the expected values of the shape variables Ui{t) indicates a significantly lower 
variance and correctly shows the linear variation in the point location across the group. 

Figure [5] shows the covariances for the groups and population after EM optimization. We see that the group covariances Ci, C2 
and the population covariance C after EM optimization indicate lower variance and a single non-zero eigen value, agreeing with the 
underlying true model. 

5 Results of Hierarchical Shape Model and Inference on a Leaf Database 

This section shows results of the proposed hierarchical multigroup shape modeling and inference on a real dataset, namely the Tree 
Leaf Databasel 14 1 comprising images of leaves of 90 wood species growing in the Czech Republic. We used 2 of the largest available 
groups comprising the species (i) Carpinus betulus having 23 leaf images (Figure |6j blue group) and (ii) Fagus sylvatica also having 
23 leaf images (Figure[6} red group). The leaf stem was removed from the images manually. Interestingly, while the blue group has 
oblong leaves that have high curvature at one end and low at the other, the red group has oblong leaves that are more symmetric with 
similar curvatures at both ends. 

Preprocessing: All leaf data pointsets {x^}, {j/j} initially undergo Procrustes alignmentlD to remove the effects of translation, 
scale, and rotation. 

Figure [6] also shows the population shape mean (black) and the expected values of the group shape means (blue, red). The 
expected values of the shapes for the 2 groups clearly show the aforementioned subtle difference in curvatures in the leaves of the 2 
species. Figure |7] shows the covariances for the leaf groups and population after EM optimization. We see that the group covariances 
Ci , C2 and the population covariance C after EM optimization indicate lower variance. 



6 Application I — Hypothesis Testing 

After the multi-group model is fit to the data, we can perform hypothesis testing to compare any pair of groups. Since the distribution 
of each group is modeled by a Gaussian, e.g. G(-; Afi, Ci), G{-\ AI2, C2), we use Hotelling's two-sample statistic to measure 
dissimilarity between any pair of groups. Conventionally, the p value associated with the hypothesis test is obtained by exploiting 
the relationship of the statistic to the F distribution with a certain degrees of freedom depending on the sample sizes (e.g. /, J, in 
our case) and the dimensionality (TD). In our case, however, the sample sizes (/ + J) can be far lower than the dimensionality TD, 
which prevents the use of the F distribution for computing p values. Thus, we propose permutation testing, using the as the test 
statistic, for hypothesis testing to overcome the the problem of low sample size. 



Figure 6: Tree Leaf Database. Top: shows the 2 groups of leaves from 2 different species of trees; one in red and the other in 
blue. Bottom: shows the expected values for the group-mean shape variables Mi (blue dots) and M2 (red dots) and the optimal 
population mean parameter M (black dots), after EM inference. 



6.1 Validation on Simulated Data 



For the simulated box-bump dataset, Figure [5]shows the expected result of permutation testing (Hq: the 2 group models are the same) 
indicating the the non-permuted labeling produces the most extreme statistic (p value — 1/35 = 0.0286) thereby leaning significantly 
towards rejection of the null hypothesis underlying permutation testing. 
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Figure 7: Left Column, (Top-to-Bottom): show the largest 5 eigen values of the covariance matrices Ci and C2, respectively, 
before and after EM optimization. Right Column, Top: show the largest 5 eigen values of the covariance matrices C, before and 
after EM optimization. Right Column, Bottom: shows the histogram of Hotelling's statistics obtained by permutation testing 
(200 randomly-chosen group-label permutations) and the statistic for the non-permuted groups (red dot). 



6.2 Results on a Database of Leaves 

Figure |6] shows the 2 leaf groups that presents a challenge for hypothesis testing because the differences between the leaves are 
subtle. Figure |7] shows the results of permutation testing indicating the the non-permuted labeling produces the most extreme statistic 
(p value — 0.005) thereby leaning significantly towards rejection of the null hypothesis underlying permutation testing. This is the 
desired result because the two leaf groups were known to correspond to two different tree species. 



7 Application II — Classification 

After the multi-group model is fit to training data, we can classify unseen shapes as follows. The test pointset is first Procrustes aligned 
to the population mean yielding a pointset, say, z. Let the hidden random variable corresponding to the test shape be w. Then, we 
evaluate the probability of the unseen shape z being drawn from the each group model, i.e., P{z\Mi, Ci, /3i) and P{z\M2, C2, 



and classify z to the class that yields a higher probabihty. We can evaluate the aforementioned probabihties as follows: 

P(z|Mi,Ci,/3i) = / P(z,u;|Mi,Ci,/3i)dz« 
P{z\'w, Mi,CuPi)P{w\h'h,Ci,Pi)dw 

« V \p{z\w') where ~ P(W^|Mi, Ci, (12) 

7.1 Results on a Database of Leaves 

We performed the classification task by training using only 3 leaves from each group and testing on the remaining 20 leaves in each 
group. We obtained a correct-classification rate of 97.5%. 

8 Discussion and Conclusion 

This paper propose a novel hierarchical graphical generative statistical model for modeling shapes (a shape is defined as an equiva- 
lence class of pointsetsi 12J) within a population for modeling multiple groups within a population. Model inference proceeds using 
the EM framework where the individual shape variables as well as the group-mean shape variables are treated as hidden random 
variables. In the E step, the approximation of the Q function using Monte-Carlo simulation entails sampling in Kendall shape space 
for which we propose a novel method that incorporates the (i) efficient HMC sampling coupled with (ii) gradient projection on shape 
manifolds. We validate the proposed modeling and inference scheme on simulated box-bump shapes|13 | and exploit the framework 
in two different applications, namely hypothesis testing and classification, where we obtain very promising results on a difficult real 
database. The framework could be extended for unsupervised learning of classes of shapes. 

The proposed hierarchical graphical model is not a simple Gaussian graphical model because of (i) smoothness penalty on 
individual shape variables as well as (ii) the (nonlinear) current norm used to model the joint distribution of the observed pointset and 
the associated hidden shape variable. In this way, the proposed framework solves a non-trivial inference problem. 

This paper does not model dependencies of the per-group covariances Ci , C2 on appropriate population-level parameters. It also 
ignores smoothness parameters on the group means because of the limited practical use of these parameters given many observed 
datasets per group. These limitations can lead to areas for future work. Nevertheless, we believe that the paper still makes important 
contributions. 
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